Inferring transcriptional bursting kinetics from single-cell snapshot data using a generalized telegraph model

Gene expression has inherent stochasticity resulting from transcription's burst manners. Single-cell snapshot data can be exploited to rigorously infer transcriptional burst kinetics, using mathematical models as blueprints. The classical telegraph model (CTM) has been widely used to explain transcriptional bursting with Markovian assumptions. However, growing evidence suggests that the gene-state dwell times are generally non-exponential, as gene-state switching is a multi-step process in organisms. Therefore, interpretable non-Markovian mathematical models and efficient statistical inference methods are urgently required in investigating transcriptional burst kinetics. We develop an interpretable and tractable model, the generalized telegraph model (GTM), to characterize transcriptional bursting that allows arbitrary dwell-time distributions, rather than exponential distributions, to be incorporated into the ON and OFF switching process. Based on the GTM, we propose an inference method for transcriptional bursting kinetics using an approximate Bayesian computation framework. This method demonstrates an efficient and scalable estimation of burst frequency and burst size on synthetic data. Further, the application of inference to genome-wide data from mouse embryonic fibroblasts reveals that GTM would estimate lower burst frequency and higher burst size than those estimated by CTM. In conclusion, the GTM and the corresponding inference method are effective tools to infer dynamic transcriptional bursting from static single-cell snapshot data.


Introduction
Gene expression is a complex biochemical reaction process with inherent stochasticity, leading to cell-tocell variability in messenger RNA (mRNA) abundance [1]. Many experimental studies have shown that, in both prokaryotic and eukaryotic cells, the expression of most genes exhibits a stochastic burst pattern over time, characterized by silent intervals interspersed between transcriptional events of genes [2,3]. Their burst kinetics, described by burst frequency and burst size [4,5], are closely related to the whole molecular processes of transcriptional regulation, but the mechanism is not clear. One crucial question is how to learn and infer interpretive biological mechanisms from extensive experimental data, thus bridging the disconnect between transcriptional bursts and their underlying molecular processes, which are crucial for understanding cell-fate decisions [6,7].
Addressing these questions requires visualizing transcription and measuring burst kinetics directly. A growing number of single-molecule experiments have dynamically highlighted transcriptional burst events. MS2 and PP7 imaging systems allow directly detecting the in vivo time-resolved RNA fluorescence of different genes within the same cell, revealing real-time dynamic transcriptional bursts in living cells [8][9][10][11]. Single-molecule fluorescence in situ hybridization (smFISH) [12,13] quantifies the steady-state distributions of RNA in thousands of fixed single cells, from which burst parameters such as burst frequency and burst size can be inferred. However, studies based on these experimental approaches are limited to a few genes, and the burst kinetics cannot be generalized to a genome-wide perspective. Recently, single-cell RNA sequencing (scRNA-seq) [14,15] has revolutionized our understanding of cell-fate decisions and made it possible to infer the dynamic behaviour of each gene from static expression distributions. To fulfil the promise of these scRNA-seq technologies, it will be crucial that mathematical models and computational methods are available to unambiguously reveal general principles of transcription on a genome-wide scale [4].
In principle, models of gene transcription should satisfy two basic requirements. First, the gene expression model should be interpretable and mechanistic. That is, the model can offer a way to understand the mechanisms behind transcriptional bursts-for example, addressing questions such as 'how do the silent transcription intervals control transcriptional bursts?', or 'how transcriptional bursting relates to gene regulation? '. Second, the model should be tractable. Tractability means the model can be analysed mathematically and used to infer transcriptional bursting kinetics for large datasets. The classical telegraph model (CTM) [16], the first rigorous mathematical treatment, connects transcription burst to stochastic gene expression. In this model, the gene switches randomly between active (ON) and inactive (OFF) states, with only the former permitting transcription initiation. The CTM has been applied in the genome-wide inference of burst kinetics from scRNA-seq [17][18][19][20][21][22]. For example, the inferences provided transcriptome-wide evidence that promoter elements affect burst size and enhancers control burst frequencies [17].
Despite the widespread use of the CTM, the model's basic assumption-gene switching between active and inactive states at constant rates-does not always hold in some specific biological systems [5,23,24]. Mathematically, the CTM is based on the Markovian assumption that all the biochemical reaction rates involved are constant, which implies that the dwell time in each state follows an exponential distribution [25]. However, most genes have complex control processes, such as chromatin opening, recruitment of transcription factors, pre-initiation complex formation, transcription initiation, as well as promoter pause and release [26]. Such processes could generate non-exponential time intervals between transcription windows in some genes or cell types. In particular, gene-state switching between active and inactive states is not a single-step manner but a multi-step process [24,27]. The multi-step process including sufficient rate-limiting steps can form a molecular memory between individual biochemical events [28], confirmed by increasing time-resolved biological experimental data [29,30]. Furthermore, this molecular memory can affect transcriptional burst kinetics [31,32].
Modelling, analysing and inferring the molecular memory in gene-state switching is challenging. One possible way is to introduce multiple intermediate states, i.e. the promoter architecture with multiple OFF and ON states [23,33,34]. Although the inclusion of additional gene states can improve the fit between a model and experimental data [35][36][37], the difficulty in determining a possibly large number of promoter states and parameters will be detrimental to the inference of the data [5,[38][39][40]. Even though the topology structure of state-switching networks is simple, the number of transition rate parameters is the same order of the number of promoter states. Thus, it is difficult to handle in practical applications using a multi-state model in the case of numerous promoter states. Alternatively, one can adopt a non-Markovian modelling framework by introducing two general dwell-time distributions for OFF and ON states, respectively. Importantly, the general dwell-time distributions are not limited to the royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221057 exponential distribution [28,[41][42][43][44][45]. This non-Markovian model has two key advantages. First, the model is built in terms of experimentally measurable quantities and interpretable parameters, rather than unobserved or inconvenient measurement gene states. Second, the model can choose an appropriate distribution with fewer parameters to characterize the transition dynamics between only two gene states and therefore overcomes the difficulty in determining the number of promoter states. Despite the good properties of the non-Markovian model, it remains challenging to derive analytical solutions, develop a practical inference algorithm, and particularly infer bursting kinetics from scRNA-seq data. Our previous study described the gene expression process with the non-exponential dwell time of OFF state and exponential dwell time of ON state using a non-Markovian model [46]. However, there is experimental evidence that the non-exponential waiting time for the ON state is also important [47], and genome-wide inference has not been studied for gene expression models in which both OFF and ON dwell times are non-exponential.
In this study, we develop a statistical inference framework to infer transcriptional bursting kinetics from single-cell snapshot data based on a generalized telegraph model (GTM) we build that extends the traditional exponential dwell-time distributions for ON and OFF states to arbitrary distributions. We solve the model analytically and derive the arbitrarily high-order steady-state binomial moments for mRNAs. Furthermore, we develop a statistical inference method based on approximate Bayesian computation to estimate the burst kinetics of the GTM. As a result, we show that the CTM can be misleading for inferring burst kinetics from simulation data based on the GTM. After the validation of synthetic data, the results with our inference algorithm are accurate and scalable. Finally, we apply this dynamic model and inference method to scRNA-seq data from mouse embryonic fibroblasts and find that GTM which consider molecular memory would estimate lower burst frequency and higher burst size than those estimated by CTM on a genome-wide scale. In conclusion, our study provides a paradigm for inferring the transcriptional bursting kinetics from single-cell snapshot data.

Model description
Transcription occurs predominantly in episodic bursts, characterized by burst frequency and burst size (figure 1a). The CTM is the prevailing model for describing the kinetic behaviour of transcriptional bursts (figure 1b). However, the promoter-state switching involves multiple biochemical reaction processes [48], resulting in the number of effective states of most promoters being greater than 2 (i.e. multiple ON states and OFF states) and diverse switching between states [5,23,33,39] (electronic supplementary material, figure S1). Mapping this complex promoter architecture to the ON-OFF non-Markovian model leads to the ON and OFF dwell times being no longer exponentially distributed, as reported in previous studies [24,27,37,49].
To make this idea precise, we consider a more general stochastic transcription model, called GTM, as illustrated in figure 1c. We assume that the dwell times in OFF and ON states, two random variables τ off and τ on , follow arbitrary probability distributions, denoted by f off (t) and f on (t), rather than the limited exponential distributions. The mRNA synthesis and degradation process are assumed to be Markovian, i.e. exponential distributions of transcription waiting time f syn (t) and mRNA's lifetime f deg (t). Specifically, f syn (t) ¼ r syn e Àrsynt , f deg (t) ¼ r deg e Àr deg t , where r syn and r deg are the mean rate of mRNA synthesis and degradation, respectively. The reaction scheme is summarized by the reaction diagram: OFF À ! ( f off Â f on )(t). Consequently, we can obtain the expression of burst frequency, BF for GTM (see the electronic supplementary material, note S1.1 for a detailed derivation): Burst size describes the average number of mRNA molecules produced per burst. We first derive the distribution of transcription-event numbers per burst. For the exponential transcription process with rate r syn , the probability of the occurrence of x transcription events conditioned on a fixed duration time t of the ON state is a time-dependent Poisson distribution P(X ¼ xjt on ¼ t) ¼ (r syn t) x e Àrsynt =x!. Then the probability of the transcription-event number per ON state period, denoted by P(X ), can be computed by the total probability theorem P(X ¼ . Therefore, we obtain the burst size, BS for GTM by some algebraic calculations (see the electronic supplementary material, note S1.2 for a detailed derivation): BS ¼ r syn t on h i: ð2:3Þ Next, we confirm the necessity of the GTM with the help of the obtained burst kinetics (equations (2.2) and (2.3)) of the GTM. From the principle of parsimony, we know that the burst size and frequency can also be directly computed with the CTM [17][18][19]. A natural and important question is whether the CTM quantitatively enough describes the burst kinetics (burst size and burst frequency). We answer it by performing inference on simulation data from the GTM. First, we repeatedly generate synthetic scRNA-seq data using the simulation algorithm of GTM (see the electronic supplementary material, table S1). In the GTM, molecular memories are characterized by Gamma distributions [31], i.e. f off (t) ¼ r k off off t k off À1 e Àr off t =G(k off ) and f on (t) ¼ r kon on t konÀ1 e Àront =G(k on ) where Γ( · ) is Gamma function, r off and r on are the rate parameters of state switching and k off and k on are the number of possible reaction steps from OFF to ON and vice versa. The Gamma distribution with only two free parameters is widely used to characterize non-exponential dynamics in biophysical reality [29,[50][51][52]. In addition, the Gamma distribution can be used to discern the existence of complex multi-step events in singlemolecule data [53]. Other distributions that may be used are the Weibull distribution and so on. Then, we use the CTM to estimate the burst kinetic parameters of the synthetic data via the maximumlikelihood estimation approach. As a result, we find that although the CTM can well fit the gene expression distributions generated by the GTM (figure 2a,e), it cannot accurately describe the dwelltime distributions (figure 2b,c,f,g), further leading to the erroneous estimations of burst frequency and burst size defined by GTM (figure 2d,h). A recent work [54] has found that the different waiting time distributions between two mRNA production events can generate indistinguishable mRNA distributions. They pointed out that an analytical necessary condition should be satisfied to reduce the  royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221057 more detailed mechanistic model to the CTM. These results supported our study and suggest that when we use the non-time-resolved single-cell snapshots data to infer burst kinetics, different dwell-time distributions may yield identical gene expression distributions, which leads to misunderstandings of burst kinetics inferred from the CTM. Therefore, it is necessary to develop a scalable statistical inference approach to accurately estimate the burst kinetics from genome-wide scRNA-seq data based on the GTM.

Model analysis
To perform statistical inference on the burst kinetics by a steady gene expression distribution from static scRNA-seq data, we theoretically solve the statistical properties of the transcriptional process described in equation (2.1) using a supplementary variable method [55,56]. Denote by M(t) the number of mRNA molecules at time t and G(t) the gene state at time t. E off (t) (E on (t)) is defined as the elapsed time since the gene switches to the OFF (ON) state at time t, respectively. Then, {M(t), G(t), E off (t), E on (t);t ≥ 0} is a continuous-time Markov process. Let p off (n, τ, t) and p on (n, τ, t) be the probability density functions (PDF) that n mRNA molecules are produced and the elapsed time is τ in OFF and ON state at time t, respectively, and we have: According to the relationship between the states of the system at time t and t + Δt, we obtain the following Chapman-Kolmogorov backward equations: The definitions of H on (τ) and S on (τ) are similar.  royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221057 Next, we focus on steady-state distributions. If the stationary distributions of p off (n, τ, t) and p on (n, τ, t) exist (above simulation has verified this point), and are denoted by p off (n, τ) and p on (n, τ), respectively, equation (2.5) converts to the following stationary chemical master equation in the limit of small Δt and large t: @ @t p on (n,t) ¼ (n þ 1)r deg p on (n þ 1,t) þ r syn p on (n À 1,t) À (nr deg þ r syn þ H on (t))p on (n,t),

ð2:6Þ
with the integral boundary conditions p off (n,0) ¼ Ð 1 0 p on (n,t)H on (t) dt, p on (n,0) ¼ Ð 1 0 p off (n,t)H off (t) dt. Based on equation (2.6) with its boundary conditions, we use the binomial moment method [57] to calculate the mRNA stationary distribution P(n) ¼ Ð 1 0 [ p off (n,t) þ p on (n,t)] dt and its statistical characteristics. Binomial moments of the mRNA stationary distribution are defined as and Note that if let τ off r deg and τ on r deg be the rescaled random variables for OFF-state and ON-state dwell times, and r syn /r deg be the rescaled mean synthesis rate, the mean expression not only equals the product of the mean synthesis rate and the stationary probability of ON state, but also the product of burst size and burst frequency.
Finally, we consider two specific cases of the GTM. First, if the ON-state dwell time is exponential, i.e. f on (t) ¼ e Àt= ton h i = t on h i, the GTM and corresponding results reduce to the previous conclusions [46] (see the electronic supplementary material, note S3.1). Further, if the OFF-state dwell time is also exponential, i.e. f off (t) ¼ r off e Àr off t , the GTM reduces to the CTM. In this model, the mean activation and inactivation rates are r off = 〈τ off 〉 −1 and r on = 〈τ on 〉 −1 , respectively. Notably, the expression for the nth binomial moment of mRNA can be simplified as follows (see the electronic supplementary material, note S3.2): ð2:11Þ Using the reconstruct formula, we obtain the mRNA stationary distribution for CTM (see the electronic supplementary material, note S3.2), consistent with previous results [3,16].

Binomial moment-based inference
In this section, we will develop a statistical inference method based on binomial moments for GTM. First, we assume that the number of cells in the experiment is large enough to allow us to acquire a steady expression distribution of a specific gene. This assumption is reasonable in scRNA-seq data and has been widely applied to genome-wide studies [17,20]. Next, with the steady distribution of each gene from scRNA-seq data as a bridge, we develop a computation inference framework that uses an approximate Bayesian computation (ABC) approach [58] to infer reliable parameter sets for our GTM (see the electronic supplementary material, table S2).

Inference framework
We develop a statistical framework that combines the ABC approach to estimate the Bayesian posterior probability π(θ|y obs ) of the model parameter-vector (θ) given the observed scRNA-seq data (y obs ). The prior information related to θ is denoted as the prior distribution π(θ), which will be iteratively updated through computing the likelihood function p(y obs |θ) of the GTM. Using Bayes' theorem, the resulting posterior distribution is given by: p(ujy obs ) ¼ p(y obs ju)p(u) Ð Q p(y obs ju)p(u)du : ð3:1Þ Ideally, we can perform the inference methods relying on iterative likelihood function maximization. However, the approximation of the mRNA distribution of the GTM is computationally prohibitive, making it impossible to evaluate the likelihood function directly.
Alternatively, we resort to a 'likelihood-free' Bayesian approach by computing only low-order moments instead of the whole probability distribution according to the obtained binomial moments (equation (2.7)). Specifically, we use the sequential Monte Carlo ABC [59] (see the electronic supplementary material, table S2), a variant of ABC, to implement the statistical inference of our model. ABC allows us to accept the parameters that make the simulated data and the observed data sufficiently close in distribution and to estimate the posterior distribution π(θ|y obs ) of the parameters through numerous simulations. First, given a dynamic model (figure 3a), we sample a candidate parameter-vector θ from the prior distribution π(θ) (figure 3b), and simulate a dataset y model from the GTM. Then, we check whether the distribution of simulated data approximates the observed data y obs (figure 3d) by predefining three extra parameters: (i) summary statistics s(y) for sufficiently representing data; (ii) discrepancy metrics ρ( · , · ) for measuring the distance between summary statistics of observed data s obs and simulated data s model from the GTM (figure 3e); and (iii) threshold ε for controlling acceptable errors (figure 3f ). Note that the low threshold ε of ABC promises a good approximation of π(θ|y obs ), but also imposes a huge computational cost and low-rate acceptances. To avoid the difficulties of ABC in terms of computational power and convergence, we use the sequential Monte Carlo ABC.
Sequential Monte Carlo ABC adds a sequence of threshold values {ε 0 , ε 1 , …, ε T } satisfied the condition ε 0 > · · · > ε T ≥ 0, and thus constructs a sequence of the intermediate posterior distributions   Figure 3. Inference procedure. Given a dynamic model (a), the parameter θ is sampled from the prior distribution π(θ) (b), and then the theoretical binomial moment (c) is used to compute the summary statistic s model (e). The static single-cell snapshot data y obs (d) can be used to calculate the summary statistic s obs (e). The sampled θ is accepted by comparing whether ρ(s obs , s model ) is less than the threshold ε ( f ). As the output of the inference procedure, the posterior distribution π(θ|y obs ) of the parameters is obtained by Bayes' theorem (g).

Summary statistics
The inference procedure requires a summary statistic s(y) to reduce the high-dimensional data to lowdimension features to compare whether the distribution of simulated data y model from the GTM is close to the observed data y obs . Here, we choose six commonly used moment statistics for inference [60,61] that are important to characterize the shape of gene expression distribution as the summary statistics: (i) the mean value μ 1 is the most commonly used indicator in statistics, and it represents the average level of mRNA expression in scRNA-seq data; (ii) the noise strength is a measurement of the dispersion of the probability distribution, defined as m 2 =m 2 1 where μ 2 is the variance; (iii) the Fano factor is another statistic that measures the dispersion of a probability distribution relative to a Poisson distribution, defined as μ 2 /μ 1 ; (iv) the skewness is a description of the symmetry of the distribution, and it is defined as m 3 =m 3=2 2 , where μ 3 is the third central moment; (v) the kurtosis describes whether the peak of the distribution is abrupt or flat, which is defined as m 4 =m 2 2 , where μ 4 is the fourth central moment; and (vi) the bimodality coefficient can describe the bimodal distribution [62], which is usually a critical feature in a dynamical system. The values of the bimodality coefficient range from 0 to 1, and values greater than 5/9 may indicate a bimodal or multimodal distribution.
Crucially, we used the binomial moment approach which theoretically provides an efficient method to directly compute the theoretical summary statistics for a given parameter θ of the GTM. Precisely, we can calculate the central moments with the binomial moment: in which R(k,i,j) ¼ (À1) i k i S(k À i,j) with S(n,k) ¼ P k i¼0 (À1) kÀi k i i n being the Stirling number of the second kind. Therefore, the above summary statistics can be expressed by binomial moments as follows:

ð3:3Þ
It should be noted that we can extend the summary statistics to higher-order moments because our binomial moments can compute arbitrary high-order moment statistics. However, we emphasize that the practice of calculating summary statistics directly from theory rather than through simulations works well but is quite unusual in the context of ABC. As a result, the algorithm is not a true Bayesian approach, because it makes the acceptance and rejection for a set of parameters become a deterministic process, and the width of the posterior distribution obtained by the algorithm will be narrower than the true posterior distribution. We suggested that the mode of the posterior distribution obtained by the algorithm is credible, while the error of the width is dependent on the sample size (the larger the sample size, the smaller the width error) [58].
royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221057 To assess the sensitivity of the summary statistics, we investigate the influence of the model parameters (k on , k off , r on and r off ) on the six summary statistics through the simulation algorithm of GTM (electronic supplementary material, table S1) and the theoretical results (equation (3.3)). Note that the parameter of ON and OFF dwell times show opposite tendencies of influence on the summary statistics (electronic supplementary material, figures S2 and S3). The results also show that the six summary statistics are sensitive to the parameters of dwell times, implying the rationality of statistics selection.

Discrepancy metrics
To eliminate the influence of absolute size between different summary statistics, we take the natural logarithm of the data instead of computing the Euclidean distance [36,63]: rðs obs ,s model Þ ¼ where s (i) is the ith component of the summary statistics vector. Note that the logarithm transformations of data do not change the data properties and correlation, but the scale is compressed. In addition, these transformations can make the data more stable and weaken the effects of collinearity and heteroscedasticity on the model.

Acceptance threshold
Generally, the acceptance thresholds {ε 0 , ε 1 , …, ε T } are usually determined by experience. However, the algorithm causes a waste of computational resources in this iteration if the default threshold is greater than the maximum distance obtained by sampling in the last iteration. Therefore, we adopt a strategy of adaptive acceptance thresholds to prevent this situation. For the first iteration, we use a large threshold ε 0 to accept 10 000 coarse parameter samples quickly, and then select the first 1000 parameter samples with the smallest discrepancy between s obs and s model as input for the next iteration. For the tth iteration, the acceptance threshold ε t is set to the median of the discrepancies of the results obtained from the (t − 1)th iteration. In detail, we set the hyperparameters: initial thresholds ε 0 = 1 and round number T = 5. With this setting, it takes about 75 s to run an example of synthetic data using an Intel(R) Core(TM) i7-6700 CPU @ 3.40 GHz.

Prior distribution
As required in Bayesian inference, we should set prior distribution π(θ), allowing for the initial parameter sampling θ in the inference procedure. In the GTM, we assume that the mRNA decay rate r deg = 1 and the dwell times in the ON state and OFF state are Gamma distributed, i.e. f off (t) ¼ r k off off t k off À1 e Àr off t =G(k off ) and f on (t) ¼ r kon on t konÀ1 e Àront =G(k on ), respectively. The parameter-vector is θ = [k off , r off , k on , r on , r syn ]. We set the prior distributions of k off and k on follow the uniform distribution U[0, 5] and the prior distributions of r off and r on follow the log-uniform distribution from interval [ − 1, 1] based on 10. In addition, the prior distribution of transcriptional rates r syn is U[0, 50].

Proposal distribution
In the inference procedure, the parameter θ Ã sampled in tth iteration is based on small perturbations around the parameter θ t−1 sampled from the distribution f t−1 (θ|ρ(s obs , s model ) ≤ ε t−1 ). Therefore, we use lognormal distribution LN(θ;θ t−1 , σ) as the proposed distribution for sampling all parameters in the tth iteration. In detail, we set the hyperparameters: the varience of lognormal distribution σ = 0.2 for all parameter.

Synthetic data
We first use synthetic scRNA-seq data from the GTM to verify whether the inference method can accurately estimate the burst kinetics of the gene expression. We apply the simulation algorithm for royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221057 the GTM with given parameters (see the electronic supplementary material, table S1) to generate mRNA distribution. Figure 4a shows that the dwell times for OFF and ON states for the GTM are nonexponential distributions and have a bimodal distribution at a steady state, which has important significance in biological systems [64] and is a common observation in scRNA-seq data [65]. Having obtained 1000 samples by the inference procedure, we find that the posterior distribution of transcriptional rate r syn can accurately estimate the true parameter value (figure 4b). Importantly, we show that the dwell time's first-order moment information (mean) can be estimated accurately (figure 4c). In addition, the burst frequency and burst size can also be estimated accurately (figure 4d) (although the individual parameters are unidentifiable in large part of parameter space). In another example, the distribution of GTM is unimodal, and the burst kinetics can also be accurately inferred (figure 4f-h).

Experimental data
Next, we apply the inference procedure to mouse embryonic fibroblasts for each allele (C57 × CAST) [17] to estimate transcriptional bursting parameters on a genome-wide scale. This scRNA-seq data was widely used to investigate genome-wide transcriptional burst kinetics [20,66,67], which was sequenced based on smart-seq2 technology (unique molecular identifier counts were used) and contained 10 727 genes and 224 individual cells on each allele. For quality control, we filter out genes expressed in less than 50 cells and cells expressed in less than 2000 genes. In addition, we remove genes with mean expression levels below 2 to ensure high expression of the inferred genes. Finally, we combine the two allele expressions together to eventually form a single-cell matrix consisting of 2162 genes and 413 cells.
Interestingly, we observe that the genes with the same average expression level have a different combination of burst frequency and burst size, consistent with previous reports [17,24]. This result implies that gene expression may be regulated by diverse burst kinetic mechanisms (figure 5a). In addition, we perform genome-wide burst kinetics inference for the same data based on the CTM, using a maximum likelihood estimation method. We discover that the estimated burst frequency and burst size keep a high positive correlation between GTM and CTM ( p-value < 2.2 × 10 −16 ; figure 5b,c). Notably, we observe that the transcriptional burst kinetics inferred with the GTM would have lower burst frequency (figure 5b) and higher burst size (figure 5c) than those estimated by CTM on the genome-wide scale, consistent with the results for single genes in figure 2d,h. Moreover, the inference method with an alternative definition of the burst frequency (1/〈τ off 〉) also presents the same results (electronic supplementary material, figure S4). These results suggest that the GTM, as an extension of royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221057 the CTM, provide a more flexible way to predict burst kinetics, and corresponding inference methods has the capability to perform genome-wide studies on scRNA-seq data.

Discussion
With the emergence of next-generation sequencing technologies, inferring gene expression burst kinetics on a genome-wide scale from static single-cell snapshot data is challenging in computational systems biology [4]. In previous studies [17,18], gene expression models used for inference and analysis, e.g. CTM, have long relied on Markov assumptions. However, increasing experimental data show that the dwell time of states is not simply exponentially distributed [24,27,37,49], leading to the failure of the Markov approximation. There is thereby an urgent need but it is still a significant challenge to develop effective methods for modelling, analysing and inferring non-Markov gene expression models.
In this paper, we have developed an inference method for inferring burst kinetics from scRNA-seq data based on GTM, which allows consideration of ON-OFF transitions with arbitrary dwell-time distributions. We demonstrated that the CTM could not estimate the non-Markovian burst kinetics, although it can well fit the gene expression distributions sometimes. We theoretically derived the analytical solution for arbitrary order binomial moments of GTM, which in turn enables us to calculate the statistics of mRNA. We developed the inference procedure to infer transcriptional burst kinetics using the summary statistic calculated by binomial moments. The results of the synthetic dataset show that our inference method can precisely estimate the burst frequency and burst size of the gene expression system as well as the average dwell time in ON and OFF states based on GTM. Furthermore, we performed a genome-wide burst kinetics inference on the mouse embryonic fibroblasts scRNA-seq data with the inference method. We found that the transcriptional burst kinetics inferred with the GTM would have lower burst frequency and higher burst size than CTM.
The GTM and the corresponding inference method are applicable to study burst kinetics on a genome-wide scale. First, the GTM is interpretable. The GTM, as an extension of the CTM, is a mechanistic model that considers the dwell times to be arbitrary distributions; and the model parameters, such as dwell times of OFF and ON states, transcription rates and degradation rates, are measurable from experiments [7,24]. Second, the GTM is solvable. The arbitrary order binomial moments of mRNA's distribution are theoretically derived. In particular, the model extends previous results. For example, the CTM is a special case of the GTM if f off (t) and f on (t) are set to exponential distributions. Third, the inference method for GTM is scalable. The approximation of the probability distributions of the stochastic model is often computationally prohibitive. With the theoretical results derived, we used a 'likelihood-free' approach by computing only low-order moments instead of the whole probability distribution. The efficiency of the inference method facilitates the extension of individual genes to genome-wide study. Finally, our inference method can learn gene expression information from static snapshot data without time-resolved data. royalsocietypublishing.org/journal/rsos R. Soc. Open Sci. 10: 221057 Our work opens up several avenues for further research. From a modelling perspective, the GTM simplifies biological processes in several aspects. The GTM only considers the generalized dwell-time distribution for OFF and ON states but does not consider the non-exponential waiting time for transcriptional and degradation processes. Many experimental data suggest that transcription (such as mRNA elongation, pause and release [68][69][70]) and degradation (such as mRNA senescence [31,71,72]) are also multi-step processes in cells. In addition, other biological processes that can affect burst kinetics should be considered in future work. For example, a recent study reveal that the inference of splicing dynamics can further investigate the transcriptional burst dynamics [73]. Also, another study has investigated that the post-transcriptional noise and different cell cycle phases would effect the transcription inference [74]. From a statistical inferring perspective, using GTM to interpret the experimental biological data has yet to be fully developed. First, the binomial moments obtained from our analysis are in the sense of a steady distribution (t → ∞) and not as a function of time t. Solving for the model's temporal solutions facilitates applying the GTM to time-resolved data. Second, the inference based on the steady-state distribution still suffers from the unidentifiability of the parameters, which may depend on the properties that different dwell-time distributions lead to the same static mRNA distribution. Third, a fraction of the variability in scRNA-seq data comes from technical noise [75], and it is a challenge to couple the technical noise into the GTM for inference, owing to the computational complexity of the model. Inspired by recent studies [20,76], we can address this issue in our future work by introducing sampling distributions.
Finally, we note that our approach is not limited to scRNA-seq data, but could also be useful for other kinds of single-cell data in which the probability distributions of mRNA can be estimated, such as smFISH data. We expect the GTM and corresponding inference method to facilitate an understanding of gene expression mechanisms from the enormous amount of biological data.
Data accessibility. Relevant code of algorithm and analysis for this research work are stored in GitHub: https://github. com/cellfate/BurstGTM and have been archived within the Zenodo repository: https://doi.org/10.5281/zenodo. 7512499. Experimental data of mouse primary fibroblasts were downloaded from [17]. Additonal data is available in the electronic supplementary material [77].